Abstract: We review the advancement of nonstationary time series analysis
from the perspective of the Cowles Commission structural equation approach.
We argue that despite the rich repertoire nonstationary time series analysis
provides for analyzing how variables respond dynamically to shocks through
the decomposition of a dynamic system into long-run and short-run relations,
nonstationarity does not invalidate the classical concerns of structural equation
modeling: identification and simultaneity bias. The same rank condition
for identification holds for stationary and nonstationary data, and some sort
of instrumental variable estimator has to be employed to yield consistency.
However, nonstationarity does raise issues of inference if the rank of
cointegration or the direction of nonstationarity is not known a priori. The usual
test statistics may not be chi-square distributed because of the presence of
unit root distributions. Classical instrumental variable estimators have to be
modified to ensure valid inference.



1. Introduction 

Let $\{w_t\}$ be a sequence of time series observations of random variables. The
multivariate vector autoregressive model (VAR) has been suggested as a useful tool to
summarize the information contained in the data and to generate predictions (e.g.
Hsiao [21, 22], Sims [50]). These models treat all variables as jointly dependent and
treat $w_t$ as a function of its past values, $w_{t-j}$. On the other hand, the Cowles
Commission approach assumes that each equation in the system describes a behavioral or
technological relation. An essential element of the Cowles Commission approach
is to decompose $w_t$ into $G$ endogenous variables, $y_t$, and $K$ exogenous variables,
$x_t$, $w_t' = (y_t', x_t')$, $G + K = m$. The values of the endogenous variables $y_t$ are
determined by the simultaneous interaction of the behavioral, technological or institutional
relations in the model, given the values of the exogenous variables, $x_t$, and the shocks of
the system (say, $\epsilon_t$). The value of $x_t$ is assumed to be determined by forces outside
of the model (e.g. Koopmans and Hood). The Cowles Commission structural
equation approach is also referred to as a structural equations model (SEM). It has
wide applications in education, psychology, econometrics, etc. (e.g. Browne and
Arminger, Hood and Koopmans [13], Muthén, Yuan and Bentler [59]).
In this paper we focus only on the aspects related to the time series analysis
of a SEM.

Since the observed data can only provide information on the conditional distribution
of $y_t$ given past values of $y_{t-j}$ and current and past values of $x_{t-j}$, there is an
issue of whether it is possible to infer from the data the true data generating process for
the SEM, which is referred to as the identification issue. Another issue for the SEM is that,
because of the joint dependency of $y_t$, the regressors of an equation are correlated with the
error (shock) of that equation, which violates the condition for the regression method
to be consistent. This is referred to as the simultaneity bias issue. The theory and
statistical properties of SEMs are well developed for stationary data (e.g. Amemiya,
Intriligator, Bodkin and Hsiao [30]).

Nelson and Plosser have shown that many economic and financial data contain
unit roots, namely, most are integrated of order 1 or 2, $I(1)$ or $I(2)$. Theories
for time series analysis with unit roots have been derived by Anderson, Chan
and Wei [7], Johansen [31, 32], Phillips, Phillips and Durlauf [46], Sims, Stock
and Watson [51], Tiao and Tsay [57], etc. Among the major findings are that (i)
$w_t$ may be cointegrated in the sense that a linear combination of $I(d)$ variables
may be of order $I(d - c)$, where $d$ and $c$ are positive numbers, say 1 (Granger and
Weiss [14], Engle and Granger, Tiao and Box [54]); (ii) "Since these models
(VAR) don't dichotomize variables into "endogenous" and "exogenous," the exclusion
restrictions used to identify traditional simultaneous equations models make
little sense" (Watson [58]); (iii) time series regressions with integrated variables can
behave very differently from those with stationary variables. Some of the estimated
coefficients converge to their true values at the speed of $\sqrt{T}$ and are asymptotically
normally distributed. Some converge to the true values at the speed of $T$ but
have non-normal asymptotic distributions and are asymptotically biased. Hence the
Wald test statistics under the null may not be approximated by chi-square distributions
(Chan and Wei [7], Sims, Stock and Watson [51], Tsay and Tiao [57]); (iv)
even though the $I(1)$ regressors may be correlated with the errors, the least squares
regression consistently estimates the cointegrating relation, hence the simultaneity
bias issues may be ignored (Phillips and Durlauf [46], Stock [52]).

In this paper we review the recent advances in nonstationary time series
analysis from the perspective of the Cowles Commission structural equation approach.
In section 2 we discuss the relationships between a vector autoregressive model
(VAR), a structural vector autoregressive model (SVAR), and the Cowles Commission
structural equations model (SEM). Section 3 discusses issues of estimating a VAR
with integrated variables. Section 4 discusses the least squares and instrumental
variable estimators, in particular the two stage least squares estimator (2SLS), for
a SVAR. Section 5 discusses the modified and lag order augmented 2SLS estimators
for a SVAR. Conclusions are in Section 6.



2. Vector autoregression, structural vector autoregression and 
structural equations model 

For ease of exposition, we shall assume that all elements of $w_t$ are $I(1)$ processes.
We assume that $w_t$ are generated by the following $p$-th order structural vector
autoregressive process without intercept terms,

(2.1)    $A(L) w_t = \epsilon_t$,

where $A(L) = A_0 + A_1 L + A_2 L^2 + \cdots + A_p L^p$. We assume that the initial observations
$w_0, w_{-1}, \ldots, w_{-p}$ are available and

A.1: $A_0$ is nonsingular and $A_0 \neq I_m$, where $I_m$ denotes an $m$-rowed identity matrix.

A.2: The roots of $|A(L)| = 0$ are either 1 or outside the unit circle.

A.3: The $m \times 1$ error or innovation vector $\epsilon_t$ is independently, identically distributed
(i.i.d.) with mean zero, nonsingular covariance matrix and finite fourth cumulants.

* The introduction of intercept terms complicates algebraic manipulation without changing the
basic message. For details, see [28].

Premultiplying (2.1) by $A_0^{-1}$ yields the conventional VAR model of Johansen
[31, 32], Phillips, Sims [50], Sims, Stock and Watson [51], Tsay and Tiao [57],
etc.,

(2.2)    $w_t = \Pi_1 w_{t-1} + \cdots + \Pi_p w_{t-p} + v_t$,

where $\Pi_j = -A_0^{-1} A_j$, $j = 1, \ldots, p$, and $v_t = A_0^{-1}\epsilon_t$. The difference between (2.1)
and (2.2) is that each equation in the former is supposed to describe a behavioral
or technological relation while the latter is a reduced form relation. Eq. (2.2) is useful
for generating predictions, but cannot be used for structural or policy analysis.
For instance, $w_{1t}, w_{2t}, w_{3t}, w_{4t}$ may denote the price and quantity of a product, per
capita income and raw material price, respectively. The first and second equations
describe a demand relation, in which quantity is inversely related to price and positively
related to income, and a supply relation, in which price is positively related to
quantity and raw material price, respectively. Only (2.1), not (2.2), can provide information
on demand and supply price elasticities. Equation (2.2) can only yield the
expected value of price and quantity given past $w_{t-j}$.

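Let $A = [A_0, A_1, \ldots, A_p]$ and define a $(p+1)m$-dimensional nonsingular matrix
$M$ as

(2.3)    $M = \begin{bmatrix} I_m & I_m & \cdots & I_m \\ 0 & I_m & \cdots & I_m \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I_m \end{bmatrix}$.

Postmultiplying $A$ by $M$ yields an error-correction representation of (2.1),

(2.4)    $\sum_{j=0}^{p-1} A_j^* \nabla w_{t-j} + A_p^* w_{t-p} = \epsilon_t$,

where $\nabla = 1 - L$ and $A_j^* = \sum_{\ell=0}^{j} A_\ell$, $j = 0, 1, \ldots, p$. Let
$A^* = [A_0^*, \ldots, A_p^*] = [\tilde A_1^*, \tilde A_2^*]$, where $\tilde A_1^* = [A_0^*, \ldots, A_{p-1}^*]$
and $\tilde A_2^* = A_p^*$; then $A^* = AM$. The coefficient matrices $\tilde A_1^*$ and $\tilde A_2^*$ provide
the implied short-run dynamics and long-run relations of the system (2.1) as defined in [26].

Similarly, we can postmultiply the coefficient matrices of (2.2) by $M$ to yield an
error-correction representation of the reduced form (2.2),

(2.5)    $\nabla w_t = \sum_{j=1}^{p-1} \Pi_j^* \nabla w_{t-j} + \Pi_p^* w_{t-p} + v_t$,

where $\Pi_j^* = \sum_{\ell=1}^{j} \Pi_\ell - I_m$.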
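As a numerical illustration of the error-correction transformation, the following minimal sketch builds a block upper-triangular matrix $M$ with $I_m$ in every block on or above the block diagonal, and checks that $AM$ stacks the cumulative sums $A_0 + \cdots + A_j$ and that the levels form and the error-correction form of the left-hand side of (2.1) coincide. The dimensions, the random coefficient matrices and the function name `error_correction_transform` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def error_correction_transform(m, p):
    """Return the (p+1)m x (p+1)m matrix with I_m in every block on or above the block diagonal."""
    return np.kron(np.triu(np.ones((p + 1, p + 1))), np.eye(m))

rng = np.random.default_rng(0)
m, p = 2, 3
A_blocks = [rng.normal(size=(m, m)) for _ in range(p + 1)]  # hypothetical A_0, ..., A_p
A = np.hstack(A_blocks)                                     # m x (p+1)m
M = error_correction_transform(m, p)
A_star = A @ M
A_star_blocks = [A_star[:, j * m:(j + 1) * m] for j in range(p + 1)]

# Evaluate both forms on an arbitrary stretch of data,
# where w[k] plays the role of w_{t-k}, k = 0, ..., p.
w = rng.normal(size=(p + 1, m))
levels_form = sum(A_blocks[k] @ w[k] for k in range(p + 1))
dw = w[:-1] - w[1:]  # dw[j] = w_{t-j} - w_{t-j-1}
ec_form = sum(A_star_blocks[j] @ dw[j] for j in range(p)) + A_star_blocks[p] @ w[p]
print(np.allclose(levels_form, ec_form))
```

Because $M$ is unit block triangular it is always nonsingular, so the transformation only re-expresses the same system in differences and a lagged level.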

In this paper we are concerned with statistical inference of (2.1). If the roots of
$|A(L)| = 0$ are all outside the unit circle, $w_t$ is stationary. It is well known that the
least squares estimator (LS) is inconsistent, while the 2SLS and 3SLS estimators using lagged
$w_t$ as instruments are consistent and asymptotically normally distributed (e.g. Amemiya,
Malinvaud). Therefore, we shall assume that at least one root of $|A(L)| = 0$
is equal to 1. More specifically,

A.4: (a) $A_p^* = \alpha\beta'$ (or $\Pi_p^* = \alpha^*\beta^{*\prime}$), where $\alpha$ and $\beta$ (or $\alpha^*$ and $\beta^*$) are
$m \times r$ matrices of full column rank $r$, $0 \le r \le m - 1$;
(b) $\alpha_\perp' J \beta_\perp$ (or $\alpha_\perp^{*\prime} J^* \beta_\perp^*$) is nonsingular, where $J = \sum_{j=0}^{p-1} A_j^*$
(or $J^* = \sum_{j=0}^{p-1} \Pi_j^*$), and $\alpha_\perp$ and $\beta_\perp$ (or $\alpha_\perp^*$ and $\beta_\perp^*$) are
$m \times (m - r)$ matrices of full column rank such that $\alpha_\perp'\alpha = 0 = \beta_\perp'\beta$
(or $\alpha_\perp^{*\prime}\alpha^* = 0 = \beta_\perp^{*\prime}\beta^*$). (If $r = 0$, then we take $\alpha_\perp = I_m = \beta_\perp$.)

Under A.1-A.4, $w_t$ has $r$ cointegrating vectors (the columns of $\beta$) and $m - r$ unit
roots. As shown by Johansen [31, 32] and Toda and Phillips [56], A.4 ensures
that the Granger representation theorem (Engle and Granger [11]) applies, so that
$\nabla w_t$ is stationary, $\beta' w_t$ is stationary, and $w_t$ is an $I(1)$ process when $r < m$.

* The long-run and short-run dichotomization defined here is derived from (2.1). It is
different from those implied by Granger and Lin [13], Johansen [31, 32] or Pesaran, Shin and
Smith, etc.

The cointegrating vectors $\beta$ provide information on the "long-run" or "equilibrium"
state to which a dynamic system tends to converge over time after any of the
variables in the system is perturbed by a shock, $\alpha$ transmits the deviation from
such long-run relations, $\beta' w_t$, into each of $w_t$, and the short-run coefficient matrices
$A_0^*, \ldots, A_{p-1}^*$ provide information on how soon such "equilibrium" is restored. In
economics, the existence of long-run relationships and the strength of attraction to such a
state depend on the actions of a market or on government intervention. In this sense, the
concept of cointegration has been applied in a variety of economic models, including the
relationships between capital and output, real wages and labor productivity, nominal exchange
rate and relative prices, consumption and disposable income, long- and short-term interest
rates, money velocity and interest rates, price of shares and dividends, production
and sales, etc. (e.g. Banerjee, Dolado, Galbraith and Hendry, Hsiao, Shen and
Fujiki [29], King, Plosser, Stock and Watson).

Since the data only provide information on the conditional density of $w_t$ given
past values $w_{t-j}$, $j = 1, 2, \ldots$, there is an issue of whether it is possible to derive (2.1)
from (2.2) (or (2.4) from (2.5)). Without prior restrictions, there can be infinitely many
different SVARs that yield an identical (2.2). To see this, we note that premultiplying
(2.1) by any nonsingular constant matrix $F$ yields

(2.6)    $\tilde A_0 w_t + \tilde A_1 w_{t-1} + \cdots + \tilde A_p w_{t-p} = \tilde\epsilon_t$,

where $\tilde A_j = F A_j$ and $\tilde\epsilon_t = F\epsilon_t$. Equations (2.1) and (2.6) yield identical (2.2), since
$-\tilde A_0^{-1}\tilde A_j = -A_0^{-1}F^{-1}FA_j = \Pi_j$ and $\tilde v_t = \tilde A_0^{-1}\tilde\epsilon_t = A_0^{-1}F^{-1}F\epsilon_t = v_t$. In other words,
(2.1) and (2.6) are observationally equivalent.
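The observational equivalence argument can be checked numerically. The sketch below is only an illustration under assumed values: the matrices `A0`, `F`, `Sigma` and the helper `reduced_form` are hypothetical. It premultiplies a structural system by a nonsingular $F$ and confirms that the reduced form coefficients and the reduced form error covariance are unchanged.

```python
import numpy as np

def reduced_form(A_list, Sigma):
    """Map structural matrices [A_0, ..., A_p] and Cov(eps) to (Pi_1..Pi_p, Cov(v))."""
    A0_inv = np.linalg.inv(A_list[0])
    Pi = [-A0_inv @ A_j for A_j in A_list[1:]]
    return Pi, A0_inv @ Sigma @ A0_inv.T

rng = np.random.default_rng(1)
# Hypothetical structural system with m = 3, p = 2 (A0 has determinant 0.9).
A0 = np.array([[1.0, 0.5, 0.0], [0.2, 1.0, 0.0], [0.0, 0.3, 1.0]])
A = [A0, rng.normal(size=(3, 3)), rng.normal(size=(3, 3))]
Sigma = np.diag([1.0, 2.0, 0.5])
# Any nonsingular F (this one has determinant 7) gives another structure.
F = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])

Pi, Sigma_v = reduced_form(A, Sigma)
Pi_tilde, Sigma_v_tilde = reduced_form([F @ A_j for A_j in A], F @ Sigma @ F.T)
print(all(np.allclose(a, b) for a, b in zip(Pi, Pi_tilde)))
```

Since both structures imply the same reduced form, data on $w_t$ alone cannot distinguish them; prior restrictions are needed to rule out all $F$ other than the trivial ones.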

* Since $\Pi_p^* = -A_0^{-1}A_p^*$, A.4 implies that (a) $\Pi_p^* = \alpha^*\beta^{*\prime}$, where $\alpha^*$ and $\beta^*$ are
$m \times r$ matrices of full column rank $r$, $0 \le r \le m - 1$, and (b) $\alpha_\perp^{*\prime} J^* \beta_\perp^*$ is
nonsingular, where $J^* = \sum_{j=0}^{p-1} \Pi_j^*$.

The $g$-th equation of (2.1) is identified if and only if the $g$-th row of any admissible
transformation matrix $F = (f_g')$ takes the form that, apart from the $g$-th element being a
nonzero constant, the rest are all zeros, i.e., $f_g' = (0, \ldots, 0, f_{gg}, 0, \ldots, 0)$ (e.g. Hsiao
[23]). The transformation matrix $F$ is admissible if and only if (2.1) and (2.6) satisfy
the same prior restrictions. Suppose that the $g$-th equation of (2.1) satisfies the
prior restrictions $a_g'\Phi_g = 0'$, where $a_g'$ denotes the $g$-th row of $A$ and $\Phi_g$ denotes a
$(p+1)m \times R_g$ matrix with known elements. Let $\Phi_g^* = M^{-1}\Phi_g$; then the existence of prior
restrictions $a_g'\Phi_g = 0'$ is equivalent to the existence of prior restrictions $a_g^{*\prime}\Phi_g^* = 0'$,
where $a_g^{*\prime}$ is the $g$-th row of $A^*$. It is shown by Hsiao [26] that
Theorem 2.1. Suppose that the $g$-th equation of (2.1) is subject to the prior
restrictions $a_g'\Phi_g = 0'$. A necessary and sufficient condition for the identification of
the $g$-th equation of (2.1) or (2.4) is that

(2.7)    $\mathrm{rank}(A\Phi_g) = m - 1$,

or

(2.8)    $\mathrm{rank}(A^*\Phi_g^*) = m - 1$.

Let $w_t' = (y_t', x_t')$, where $y_t'$ and $x_t'$ are $1 \times G$ and $1 \times K$, respectively, and
$G + K = m$. Let

$A(L) = \begin{bmatrix} A_{11}(L) & A_{12}(L) \\ A_{21}(L) & A_{22}(L) \end{bmatrix}$

and $\epsilon_t' = (\epsilon_{1t}', \epsilon_{2t}')$ be the conformable partitions. The Cowles Commission decomposition
of $w_t$ into jointly dependent variables $y_t$ and exogenous variables $x_t$ is
equivalent to imposing the prior restrictions (Zellner and Palm [60])

(2.9)    $A_{21}(L) = 0$ and $E\epsilon_{1t}\epsilon_{2t}' = 0$.

The prior restrictions (2.9) restrict the admissible transformation matrix $F$ to be
block diagonal (e.g. Hsiao). Therefore,

Corollary 2.1. Under (2.9) and $a_g'\Phi_g = 0'$, a necessary and sufficient condition
for the identification of the $g$-th equation for $g \le G$ is

(2.10)    $\mathrm{rank}[(A_{11}\ A_{12})\Phi_g] = G - 1$,

where $A_{11}$ and $A_{12}$ are conformable partitions of $A$.
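To make the rank condition concrete, here is a small sketch for the demand and supply example mentioned earlier, with $w_t$ ordered as (price, quantity, income, raw material price), price and quantity treated as endogenous ($G = 2$), and, purely as a simplifying assumption, a static system ($p = 0$). The coefficient values and the helper names `rank_condition_holds` and `structural_rows` are hypothetical.

```python
import numpy as np

def rank_condition_holds(A_rows, Phi_g, n_equations):
    """Check rank(A_rows @ Phi_g) == n_equations - 1, as in (2.7) or (2.10)."""
    return bool(np.linalg.matrix_rank(A_rows @ Phi_g) == n_equations - 1)

def structural_rows(a, b, c, d):
    """Rows (A_11 A_12) for w = (price, quantity, income, raw material price).

    Demand: a*price + quantity - b*income          = shock
    Supply:   price - c*quantity - d*raw_material  = shock
    """
    return np.array([[a, 1.0, -b, 0.0],
                     [1.0, -c, 0.0, -d]])

G = 2
Phi_demand = np.array([[0.0], [0.0], [0.0], [1.0]])  # demand excludes raw material price
Phi_supply = np.array([[0.0], [0.0], [1.0], [0.0]])  # supply excludes income

rows = structural_rows(a=0.8, b=0.5, c=0.6, d=0.4)
demand_identified = rank_condition_holds(rows, Phi_demand, G)
supply_identified = rank_condition_holds(rows, Phi_supply, G)

# If raw material price does not enter supply (d = 0), nothing shifts supply
# along the demand curve, and the demand equation is no longer identified.
rows_no_shifter = structural_rows(a=0.8, b=0.5, c=0.6, d=0.0)
demand_identified_without_shifter = rank_condition_holds(rows_no_shifter, Phi_demand, G)
```

The check involves only the structural coefficients and the restriction matrix; nothing about unit roots or cointegration enters it.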

The identification condition (2.7) or (2.8) does not require any prior knowledge
of the direction of nonstationarity or the rank of cointegration. As a matter of
fact, many macroeconometric models are identified without any prior knowledge of the
location of unit roots or the rank of cointegration (e.g. the Klein [34] interwar model
and the large scale Wharton quarterly model (Klein and Evans [35])). Of course, if
such information is available, it can improve the efficiency of system estimators and
simplify the issues of inference considerably (e.g. King, Plosser, Stock and Watson).



3. Inference in VAR (or reduced form) 

Consider the $g$-th equation of (2.2),

(3.1)    $w_g = X\pi_g + v_g$,

where $w_g$ is the $T \times 1$ vector of the $g$-th element of $w_t$, $w_{gt}$, $X = (W_{-1}, \ldots, W_{-p})$ is
the $T \times mp$ matrix of $w_{t-1}, \ldots, w_{t-p}$, $\pi_g$ is the corresponding vector of coefficients,
and $v_g$ is the $T \times 1$ vector of the $g$-th element of $v_t$, $v_{gt}$.

Rewrite (3.1) in terms of linearly independent $I(0)$ and full rank $I(1)$ regressors
$X_1^*$ and $X_2^*$, respectively, by postmultiplying $X$ by a nonsingular transformation matrix
$M_x$, $X^* = XM_x = (X_1^*, X_2^*)$; we have

(3.2)    $w_g = X^*\pi_g^* + v_g$,

where $\pi_g^* = M_x^{-1}\pi_g = (\pi_{g1}^{*\prime}, \pi_{g2}^{*\prime})'$. The least squares estimator of (3.1) is equal to
$M_x$ times the least squares estimator of (3.2),

(3.3)    $\hat\pi_g = (X'X)^{-1}(X'w_g) = M_x (X^{*\prime}X^*)^{-1}X^{*\prime}w_g = M_x[\pi_g^* + (X^{*\prime}X^*)^{-1}X^{*\prime}v_g]$.
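The algebra behind (3.3) is simply that least squares is equivariant under a nonsingular transformation of the regressors. A minimal numerical check is sketched below; the simulated random-walk regressors, the coefficient values and the particular triangular `M_x` are illustrative assumptions, not the transformation used in the paper (which need not be known in practice).

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 200, 4
X = rng.normal(size=(T, k)).cumsum(axis=0)          # simulated I(1) regressors
w_g = X @ np.array([0.5, -0.2, 0.1, 0.3]) + rng.normal(size=T)

M_x = np.triu(np.ones((k, k)))                      # any nonsingular matrix would do
X_star = X @ M_x                                    # transformed regressors X* = X M_x

pi_hat = np.linalg.lstsq(X, w_g, rcond=None)[0]           # LS on the original regressors
pi_star_hat = np.linalg.lstsq(X_star, w_g, rcond=None)[0]  # LS on the transformed regressors

# LS on X equals M_x times LS on X*.
print(np.allclose(pi_hat, M_x @ pi_star_hat, atol=1e-6))
```

This is why the transformation can be used as a purely analytical device: the estimator actually computed from (3.1) inherits the limiting behavior of the components of the estimator of (3.2).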

The statistical properties of (3.3) can be derived by making use of the fundamental
functional central limit theorems proved by Chan and Wei [7], Phillips and
Durlauf [46], etc.:

Theorem 3.1. Let $\eta_t$ be an $m \times 1$ vector of random variables with
$E(\eta_t \mid \eta_{t-1}, \ldots, \eta_1) = 0$, $E(\eta_t\eta_t' \mid \eta_{t-1}, \ldots, \eta_1) = I_m$, and bounded fourth moments.
Let $F(L) = \sum_{j=0}^{\infty} F_j L^j$ and $G(L) = \sum_{j=0}^{\infty} G_j L^j$ with $\sum_{j=0}^{\infty} j|F_j| < \infty$ and
$\sum_{j=0}^{\infty} j|G_j| < \infty$. Let $\xi_t = \sum_{s=1}^{t} \eta_s$, and let $B(r)$ denote an $m \times 1$ dimensional
Brownian motion process. Then

(a) $T^{-1/2}\sum_{t=1}^{T} F(L)\eta_t \Rightarrow N(0, F(1)F(1)')$,
(b) $T^{-1}\sum_{t=1}^{T} \xi_{t-1}\eta_t' \Rightarrow \int B(r)\,dB(r)'$,
(c) $T^{-1}\sum_{t=1}^{T} \xi_t[F(L)\eta_t]' \Rightarrow F(1)' + \int B(r)\,dB(r)'F(1)'$,
(d) $T^{-1}\sum_{t=1}^{T} [F(L)\eta_t][G(L)\eta_t]' \to \sum_{j=0}^{\infty} F_jG_j'$,
(e) $T^{-2}\sum_{t=1}^{T} \xi_t\xi_t' \Rightarrow \int B(r)B(r)'\,dr$,

where to simplify notation $\int_0^1$ is denoted by $\int$, and $\to$ and $\Rightarrow$ denote convergence
in probability and convergence in distribution of the associated probability measure, respectively.
Making use of Theorem 3.1, it follows that

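Theorem 3.2. Under Assumptions A.1-A.4, as $T \to \infty$,

(3.4)    $\sqrt{T}(\hat\pi_{g1}^* - \pi_{g1}^*) \Rightarrow N(0, \sigma_{v_g}^2 M_{x_1^*x_1^*}^{-1})$,

(3.5)    $T(\hat\pi_{g2}^* - \pi_{g2}^*) \Rightarrow \left(\int B_{x_2^*}(r)B_{x_2^*}(r)'\,dr\right)^{-1}\left(\int B_{x_2^*}(r)\,dB_{v_g}(r)\right)$,

where $\sigma_{v_g}^2$ denotes the variance of $v_{gt}$ and $M_{x_1^*x_1^*} = \mathrm{plim}\,\frac{1}{T}\sum_{t=1}^{T} x_{1t}^* x_{1t}^{*\prime}$.
Moreover, (3.4) and (3.5) are asymptotically independent.

* Such a transformation always exists. However, it does not need to be known a priori. The use
of (3.2) is to facilitate the derivation of statistical properties of the estimators of (3.1) or (2.1).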
The least squares estimator (3.3) is a linear combination of $\hat\pi_{g1}^*$ and $\hat\pi_{g2}^*$. Its
limiting distribution is determined by the limiting distribution of whichever included
component of $\hat\pi_g^*$ converges at the slower rate. Since the limiting distribution of
$\hat\pi_{g2}^*$ is nonstandard and involves a matrix unit root distribution, the usual Wald test
statistic under the null may not be approximated by the chi-square distribution if the null
hypothesis involves coefficients in the direction of nonstationarity (e.g. Dolado and
Lütkepohl, Sims, Stock and Watson [51], Tsay and Tiao [57]). On the other hand, if $w_t$ is
cointegrated and the rank of cointegration is known a priori, Ahn and Reinsel [1] and
Johansen [31, 32], using the reduced rank framework proposed by Anderson [3], have shown
that the estimators of the coefficients of the cointegrating vectors are asymptotically mixed
normal, hence there is no inference problem. The Wald test statistics constructed from the
reduced rank regression will again be asymptotically chi-square distributed. This is because
imposing the reduced rank condition is equivalent to avoiding the estimation of the unit roots
in the system.

Unfortunately, as discussed in section 2, prior information on the rank of cointegration
or the direction of nonstationarity is usually lacking. One way to deal with this is
to pretest the data for the presence of cointegration and the rank of cointegration,
and then apply the reduced rank regression of Ahn and Reinsel [1] or Johansen [31, 32].
However, statistical tests for the rank of cointegration have very poor finite sample
performance (e.g. Stock). The first-stage unit root test and second-stage cointegration
test can induce substantial size distortion. For instance, Elliott and Stock
[10] consider a bivariate problem in which there is uncertainty about whether the
regressor has a unit root. In their Monte Carlo simulation they find that unit root
pretests can induce substantial size distortions in the second-stage test. If the
innovations of the regressors and the second-stage regression error are correlated, the
first-stage Dickey-Fuller t-statistic and the second-stage t-statistic will be dependent,
so the size of the second stage in this two-stage procedure cannot be controlled,
even asymptotically. Many other Monte Carlo studies also show that serious size
and power distortions arise and that the number of linearly independent cointegrating
vectors tends to be overestimated as the dimension of the system increases relative
to the time dimension (e.g. Ho and Sorensen, Gonzalo and Pitarakis).

Another way is to correct the miscentering and skewness of the limiting distribution
of the least squares estimator due to the "endogeneities" of the predetermined
integrated regressors (e.g. Park, Phillips, Phillips and Hansen,
Robinson and Hualde [49]). However, since the rank of cointegration and the direction
of nonstationarity are unknown, Phillips [45] proposes to deal with potential endogeneities
by making a correction to the least squares regression formula that adjusts
for whatever endogeneities there may be in the predetermined variables due
to their nonstationarity, by transforming the dependent variables into

(3.6)    $w_t^+ = w_t - \Omega_{v\nabla w}\Omega_{\nabla w\nabla w}^{+}\nabla w_t$,

where $\Omega_{\nabla w\nabla w} = \sum_{j=-\infty}^{\infty} E(\nabla w_t \nabla w_{t-j}')$, $\Omega_{v\nabla w} = \sum_{j=-\infty}^{\infty} E(v_t \nabla w_{t-j}')$ and
$\Omega_{\nabla w\nabla w}^{+}$ denotes the Moore-Penrose generalized inverse. Using $w_t^+$ in place of $w_t$
in (2.2) is equivalent to modifying the error term from $v_t$ to $v_t - \Omega_{v\nabla w}\Omega_{\nabla w\nabla w}^{+}\nabla w_t$,
which now becomes serially correlated because $\nabla w_t$ is serially correlated. To correct
for this order $(1/T)$ serial correlation bias term, Phillips [45] suggests further
adding $(X'X)^{-1}(0, T\Delta^+_{v\nabla w})$ to the least squares regression estimator of $w_t^+$
on $\nabla w_{t-1}, \ldots, \nabla w_{t-p+1}, w_{t-p}$, where $\Delta^+_{v\nabla w} = \Omega_{v\nabla w}\Omega_{\nabla w\nabla w}^{+}\Delta_{\nabla w\nabla w}$, and $\Delta_{uv}$
denotes the one-sided long-run covariance of two sets of $I(0)$ variables $(u_t, v_t)$,
$\Delta_{uv} = \sum_{j=0}^{\infty} \Gamma_{uv}(j)$, where $\Gamma_{uv}(j) = E u_t v_{t-j}'$.

* If $w_t$ are cointegrated, $\Omega_{\nabla w\nabla w}$ does not have full rank.

Consistent estimates of $\Omega_{uv}$ or $\Delta_{uv}$
can be obtained by using the kernel method (e.g. Hannan [15], Priestley [3]),

(3.7)    $\hat\Omega_{uv} = \sum_{j=-T+1}^{T-1} h(j/K)\hat\Gamma_{uv}(j)$,

(3.8)    $\hat\Delta_{uv} = \sum_{j=0}^{T-1} h(j/K)\hat\Gamma_{uv}(j)$,

where $\hat\Gamma_{uv}(j)$ is a consistent sample covariance estimator of $\Gamma_{uv}(j)$, $h(\cdot)$ is a
kernel function and $K$ is a lag truncation or bandwidth parameter. Assuming that

Assumption 3.1. The kernel function $h(\cdot): \mathbb{R} \to [-1, 1]$ is a twice continuously
differentiable even function with:

(a) $h(0) = 1$, $h'(0) = 0$, $h''(0) \neq 0$; and either
(b) $h(x) = 0$ for $|x| \ge 1$, with $\lim_{|x| \to 1} h(x)/(1 - |x|)^2 = $ constant, or
(b') $h(x) = O((1 - x)^2)$, as $|x| \to 1$.

Assumption 3.2. The bandwidth parameter $K$ in the kernel estimates (3.7) and
(3.8) has an expansion rate $K \sim c_T T^k$ for some $k \in (1/4, 2/3)$ and for some slowly
varying function $c_T$, and thus $K/T^{2/3} + T^{1/4}/K \to 0$ and $K^4/T \to \infty$ as $T \to \infty$.
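The kernel estimators of the two-sided and one-sided long-run covariances can be sketched in a few lines. The version below is a minimal illustration, assuming mean-zero $I(0)$ series, the Parzen kernel (one kernel that vanishes outside $[-1, 1]$) and a bandwidth of $T^{1/3}$; these choices and the function names are assumptions made for the example, not prescriptions from the paper.

```python
import numpy as np

def parzen(x):
    """Parzen kernel: equals 1 at 0, is even, and vanishes for |x| >= 1."""
    x = np.abs(x)
    return np.where(x <= 0.5, 1 - 6 * x**2 + 6 * x**3,
                    np.where(x <= 1.0, 2 * (1 - x)**3, 0.0))

def sample_autocov(u, v, j):
    """Sample analogue of Gamma_uv(j) = E u_t v_{t-j}' (j may be negative)."""
    T = u.shape[0]
    if j >= 0:
        return u[j:].T @ v[:T - j] / T
    return u[:T + j].T @ v[-j:] / T

def long_run_cov(u, v, K):
    """Return (Omega_hat, Delta_hat): two-sided and one-sided kernel estimates."""
    T = u.shape[0]
    J = min(T - 1, int(np.ceil(K)))  # kernel weights are zero for |j| >= K
    omega = sum(parzen(j / K) * sample_autocov(u, v, j) for j in range(-J, J + 1))
    delta = sum(parzen(j / K) * sample_autocov(u, v, j) for j in range(0, J + 1))
    return omega, delta

rng = np.random.default_rng(5)
T = 500
u = rng.normal(size=(T, 2))          # simulated mean-zero I(0) series
K = T ** (1 / 3)                     # exponent inside (1/4, 2/3)
omega_hat, delta_hat = long_run_cov(u, u, K)
```

When both arguments are the same series, the two estimates are linked by $\hat\Omega = \hat\Delta + \hat\Delta' - \hat\Gamma(0)$, which gives a quick consistency check on an implementation.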

Phillips [45] shows that the modified least squares estimates are either asymptotically
normally distributed or mixed normal. However, because the direction of
nonstationarity is unknown, the conditional covariance matrix cannot be derived.
Therefore, if the test statistic involves some of the coefficients of nonstationary
variables, the limiting distribution becomes a mixture of chi-square variates with
weights between 0 and 1. In other words, if a test based on the chi-square distribution
rejects the null at nominal significance level $\alpha$, then its actual significance level is
less than $\alpha$; that is, tests based on the chi-square distribution are conservative.

* Under A.3, $\Delta_{v\nabla w} = 0$.

Toda and Yamamoto have suggested a lag-order augmented approach to
circumvent the issue of non-standard distributions associated with integrated regressors
by overfitting a VAR with additional $d_{\max}$ lags, where $d_{\max}$ denotes the
maximum order of integration suspected. In our case, $d_{\max} = 1$. In other words,
instead of estimating (2.2), we estimate

(3.9)    $w_t = \Pi_1 w_{t-1} + \cdots + \Pi_p w_{t-p} + \Pi_{p+1} w_{t-p-1} + v_t$.

Since we know a priori that $\Pi_{p+1} = 0$, we are only interested in the estimates of
$\Pi_j$, $j = 1, \ldots, p$. The limiting distributions of the least squares estimates of (3.9)
can be derived from the limiting distributions of the least squares estimates of the
error-correction form,

(3.10)    $w_t = \Pi_1^* \nabla w_{t-1} + \cdots + \Pi_p^* \nabla w_{t-p} + \Pi_{p+1}^* w_{t-p-1} + v_t$,

because $\Pi_j^* = \sum_{\ell=1}^{j} \Pi_\ell$, $j = 1, \ldots, p+1$, or $\Pi_j = \Pi_j^* - \Pi_{j-1}^*$, where $\Pi_0^* = 0$. Since
$\Pi_j^*$, $j = 1, \ldots, p$, are coefficients of stationary regressors, Theorem 3.2 shows that the
least squares estimates of $\Pi_j^*$, $j = 1, \ldots, p$, converge to the true values at the speed of
$\sqrt{T}$ and are asymptotically normally distributed. Only the least squares estimates
of $\Pi_{p+1}^*$ may be $T$-convergent and have non-normal limiting distributions. However,
since we know a priori that $\Pi_{p+1} = 0$, our interest is only in $\Pi_j$, $j = 1, \ldots, p$. The
least squares regression of (3.9) yields $\hat\Pi_j = \hat\Pi_j^* - \hat\Pi_{j-1}^*$, $j = 1, \ldots, p$; therefore, they
are asymptotically normally distributed. Wald test statistics of the null hypothesis
constructed from regression estimates of (3.9) will again be asymptotically chi-square
distributed.
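A lag-order augmented Wald test can be sketched as follows. This is a minimal illustration, assuming no intercept, a single equation estimated by least squares, and a null hypothesis that the first $p$ lags of one variable do not enter the equation of another; the function name `lag_augmented_wald`, the simulated data and the coefficient 0.8 are hypothetical choices for the example.

```python
import numpy as np

def lag_augmented_wald(w, p, effect, cause, d_max=1):
    """Wald statistic for H0: lags 1..p of `cause` have zero coefficients in the
    `effect` equation, from a VAR fitted with p + d_max lags. The extra d_max
    lags are estimated but left out of the test. Compare with chi-square(p)."""
    T, m = w.shape
    q = p + d_max
    y = w[q:, effect]
    X = np.hstack([w[q - j:T - j] for j in range(1, q + 1)])  # [w_{t-1}, ..., w_{t-q}]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    idx = [(j - 1) * m + cause for j in range(1, p + 1)]      # tested coefficients only
    b_s = b[idx]
    V_s = sigma2 * XtX_inv[np.ix_(idx, idx)]
    return float(b_s @ np.linalg.solve(V_s, b_s))

rng = np.random.default_rng(3)
T = 400
e = rng.normal(size=(T, 2))

w_null = e.cumsum(axis=0)                  # two independent random walks
w1 = e[:, 0].cumsum()
w2 = np.empty(T)
w2[0] = e[0, 1]
w2[1:] = 0.8 * w1[:-1] + e[1:, 1]          # w1 enters the w2 equation with one lag
w_causal = np.column_stack([w1, w2])

stat_null = lag_augmented_wald(w_null, p=1, effect=1, cause=0)
stat_causal = lag_augmented_wald(w_causal, p=1, effect=1, cause=0)
```

The statistic would be compared with a chi-square critical value with $p$ degrees of freedom, without any pretest for unit roots or cointegration rank.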

The Phillips [45] modified estimator maintains the $T$-convergence of the coefficients
associated with full rank integrated regressors. The Toda-Yamamoto lag
order augmented estimator is only $\sqrt{T}$-convergent. So the Phillips [45] modified
estimator is likely to be asymptotically more efficient. However, computationally, the
Phillips modified estimator is much more complicated than the lag order augmented
estimator. Moreover, test statistics constructed from the modified estimators can
only give bounds on the size of the test because the conditional variance is
unknown, while test statistics constructed from the lag order augmented estimator
asymptotically yield the exact size.



4. Least squares and two stage least squares estimation of SVAR 

For ease of exposition, we assume that prior information is in the form of excluding
certain variables, both current and lagged, from an equation. Let the $g$-th equation
of (2.1) be written as

(4.1)    $w_g = Z_g\delta_g + \epsilon_g$,

where $w_g$ and $\epsilon_g$ denote the $T \times 1$ vectors $(w_{g1}, \ldots, w_{gT})'$ and $(\epsilon_{g1}, \ldots, \epsilon_{gT})'$,
respectively, and $Z_g$ denotes the $T \times [(p+1)g_\Delta - 1]$ dimensional matrix of the $g_\Delta$
included current and lagged variables of $w_t$.

The least squares estimator of (4.1) is given by

(4.2)    $\hat\delta_{g,LS} = (Z_g'Z_g)^{-1}Z_g'w_g$.

Phillips and Durlauf [46] and Stock [52] have shown that the least squares estimator
with integrated regressors is consistent even when the regressors and the
errors are correlated. However, the basic assumption underlying their result is that
the regressors are not cointegrated. In a dynamic framework, even though $w_{t-j}$ are
$I(1)$, the current and lagged variables are trivially cointegrated. It was shown in
[21] that when contemporaneous jointly dependent variables also appear as explanatory
variables in (4.1), applying the least squares method to (4.1) does not yield a consistent
estimator of $\delta_g$. To see this, let $M_g$ be the nonsingular transformation matrix that
transforms $Z_g$ into $Z_g^* = Z_gM_g = (Z_{g1}^*, Z_{g2}^*)$, where $Z_{g1}^*$ denotes the $T$ observations of
$\ell_g$ linearly independent $I(0)$ variables and $Z_{g2}^*$ denotes the $T$ observations of $b_g$ full
rank $I(1)$ variables; then

(4.3)    $w_g = Z_gM_gM_g^{-1}\delta_g + \epsilon_g = Z_g^*\delta_g^* + \epsilon_g$,

where $\delta_g^* = M_g^{-1}\delta_g = (\delta_{g1}^{*\prime}, \delta_{g2}^{*\prime})'$, with $\delta_{g1}^*$ and $\delta_{g2}^*$ denoting the $\ell_g \times 1$ and $b_g \times 1$
vectors, respectively. Such a transformation always exists. For instance, if no cointegrating
relation exists among the included $w_t$, say $\underline{w}_{gt}$, then $b_g$ equals the dimension
of the included jointly dependent variables, $g_\Delta$, $Z_{g1}^*$ consists of the first differenced
current and $p - 1$ lagged included variables, and $Z_{g2}^*$ is simply the $T \times b_g$ (or $T \times g_\Delta$)
matrix of the included $\underline{w}_{gt}$ lagged by $p$ periods, $W_{g,-p}$. On the other hand, if there exist
$g_\Delta - b_g$ linearly independent cointegrating relations among the $g_\Delta$ included variables
$\underline{w}_{gt}$, then $Z_{g1}^*$ consists of the current and $p - 1$ lagged $\nabla\underline{w}_{gt}$ and the $W_{g,-p}d_g$
cointegrating relations, where $W_{g,-p}$ is the $T \times g_\Delta$ matrix of included $\underline{w}_{g,t-p}$, $d_g$ is a
$g_\Delta \times (g_\Delta - b_g)$ matrix of constants, and $Z_{g2}^*$ consists of the $T$ observed $b_g$ full rank
$I(1)$ variables $W_{g2,-p}$.

* By full rank $I(1)$ variables we mean that there is no cointegrating relation among $Z_{g2}^*$.

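The least squares estimator (4.2) can be written as $\hat\delta_{g,LS} = M_g\hat\delta_{g,LS}^*$, where $\hat\delta_{g,LS}^*$
denotes the least squares estimator of (4.3). Using Theorem 3.1, one can show that
$\frac{1}{T}Z_{g1}^{*\prime}Z_{g1}^* \to M_{z_{g1}^*z_{g1}^*}$, $\frac{1}{T^2}Z_{g2}^{*\prime}Z_{g2}^* \Rightarrow M_{z_{g2}^*z_{g2}^*}$,
$\frac{1}{T^{3/2}}Z_{g1}^{*\prime}Z_{g2}^* \to 0$ and $\frac{1}{T}Z_{g1}^{*\prime}\epsilon_g \to b$, where
$b = [E(\epsilon_{gt}\underline{w}_{gt}'), 0']' = [(A_0^{-1}\Sigma_{\epsilon\epsilon,g})_g', 0']'$, $\Sigma_{\epsilon\epsilon,g}$ is the $g$-th
column of the covariance matrix $\Sigma_{\epsilon\epsilon}$ of $\epsilon_t$, $(A_0^{-1}\Sigma_{\epsilon\epsilon,g})_g$ is the $(g_\Delta - 1) \times 1$ subvector
of $A_0^{-1}\Sigma_{\epsilon\epsilon,g}$ that corresponds to the $g_\Delta - 1$ included variables $\underline{w}_{gt}$ in the $g$-th
equation, and $M_{z_{g1}^*z_{g1}^*}$ and $M_{z_{g2}^*z_{g2}^*}$ are nonsingular. It follows that

(4.4)    $\begin{pmatrix} \hat\delta_{g1,LS}^* \\ \hat\delta_{g2,LS}^* \end{pmatrix} \to \begin{pmatrix} \delta_{g1}^* \\ \delta_{g2}^* \end{pmatrix} + \begin{pmatrix} M_{z_{g1}^*z_{g1}^*}^{-1} b \\ 0 \end{pmatrix}$.

Although the coefficients of $Z_{g2}^*$ can be consistently estimated, the coefficients of $Z_{g1}^*$
cannot. Since $\hat\delta_{g,LS}$ is a linear combination of $\hat\delta_{g1,LS}^*$ and $\hat\delta_{g2,LS}^*$, $\hat\delta_{g,LS}$ is inconsistent.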

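When the errors and regressors are correlated, a standard procedure is to use the
instrumental variable method. Using lagged variables as instruments, the two stage
least squares estimator of $\delta_g$ is given by

(4.5)    $\hat\delta_{g,2SLS} = [Z_g'X(X'X)^{-1}X'Z_g]^{-1}[Z_g'X(X'X)^{-1}X'w_g]$,

where $X = (W_{-1}, W_{-2}, \ldots, W_{-p})$ and $W_{-j}$ denotes the $T \times m$ matrix representation
of $w_{t-j}$. Transforming $X$ into linearly independent $I(0)$ and full rank $I(1)$ processes
$X_1^*$ and $X_2^*$, respectively, by $M_x$, $XM_x = X^* = [X_1^*, X_2^*]$, the 2SLS estimator (4.5) is
equal to $M_g\hat\delta_{g,2SLS}^*$, where

(4.6)    $\hat\delta_{g,2SLS}^* = [Z_g^{*\prime}X^*(X^{*\prime}X^*)^{-1}X^{*\prime}Z_g^*]^{-1}[Z_g^{*\prime}X^*(X^{*\prime}X^*)^{-1}X^{*\prime}w_g]$.

Since $\frac{1}{T}Z_{g1}^{*\prime}X_1^* \to M_{z_{g1}^*x_1^*}$, $\frac{1}{T}X_1^{*\prime}X_1^* \to M_{x_1^*x_1^*}$,
$\frac{1}{T^2}Z_{g2}^{*\prime}X_2^* \Rightarrow M_{z_{g2}^*x_2^*}$, $\frac{1}{T^2}X_2^{*\prime}X_2^* \Rightarrow M_{x_2^*x_2^*}$, the cross-moment
terms between $I(0)$ and $I(1)$ variables normalized by $T^{3/2}$ converge to 0,
$\frac{1}{T}X_1^{*\prime}\epsilon_g \to 0$ and $\frac{1}{T^2}X_2^{*\prime}\epsilon_g \to 0$, with $M_{x_1^*x_1^*}$ and $M_{x_2^*x_2^*}$
nonsingular, it follows that $\hat\delta_{g,2SLS}^*$ converges to $\delta_g^*$.
Hence the 2SLS estimator of $\delta_g$ is consistent.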
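The two stage least squares formula can be written compactly by first projecting the included regressors on the instruments. The sketch below is a minimal illustration under assumed data: the simulated instruments, regressors and coefficient values are hypothetical stand-ins for lagged $I(1)$ variables and an equation with an endogenous regressor, and `two_stage_least_squares` is a name chosen for the example.

```python
import numpy as np

def two_stage_least_squares(w_g, Z_g, X):
    """Compute [Z'P_X Z]^{-1} Z'P_X w, with P_X the projection on the columns of X."""
    # First stage: fitted values of the regressors given the instruments.
    Z_hat = X @ np.linalg.lstsq(X, Z_g, rcond=None)[0]
    # Second stage: Z_hat'Z_g equals Z_g'P_X Z_g because P_X is symmetric and idempotent.
    return np.linalg.solve(Z_hat.T @ Z_g, Z_hat.T @ w_g)

rng = np.random.default_rng(4)
T = 300
X = rng.normal(size=(T, 4)).cumsum(axis=0)        # stand-in for lagged I(1) instruments
C = rng.normal(size=(4, 2))
common = rng.normal(size=T)                       # shock shared by regressor and error
Z_g = X @ C + np.column_stack([common, rng.normal(size=T)])
delta_true = np.array([0.7, -0.3])
w_g = Z_g @ delta_true + common + rng.normal(size=T)

delta_2sls = two_stage_least_squares(w_g, Z_g, X)
delta_ls = np.linalg.lstsq(Z_g, w_g, rcond=None)[0]
```

Two properties are easy to verify with this function: it reproduces the true coefficients when the equation has no error, and it collapses to least squares when the instrument set coincides with the regressors.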



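Let $H_g = \mathrm{diag}(T^{-1/2}I_{\ell_g}, T^{-1}I_{b_g})$ and $H_x = \mathrm{diag}(T^{-1/2}I_{\ell}, T^{-1}I_{b^*})$, where $\ell$ and
$b^*$ are the column dimensions of $X_1^*$ and $X_2^*$, respectively. Under assumptions A.1-A.4,
as $T \to \infty$, $H_g^{-1}(\hat\delta_{g,2SLS}^* - \delta_g^*)$ is asymptotically equivalent to

(4.7)    $\begin{pmatrix} \sqrt{T}(\hat\delta_{g1,2SLS}^* - \delta_{g1}^*) \\ T(\hat\delta_{g2,2SLS}^* - \delta_{g2}^*) \end{pmatrix} \simeq \begin{pmatrix} (M_{z_{g1}^*x_1^*}M_{x_1^*x_1^*}^{-1}M_{x_1^*z_{g1}^*})^{-1}M_{z_{g1}^*x_1^*}M_{x_1^*x_1^*}^{-1}(T^{-1/2}X_1^{*\prime}\epsilon_g) \\ (M_{z_{g2}^*x_2^*}M_{x_2^*x_2^*}^{-1}M_{x_2^*z_{g2}^*})^{-1}M_{z_{g2}^*x_2^*}M_{x_2^*x_2^*}^{-1}(T^{-1}X_2^{*\prime}\epsilon_g) \end{pmatrix}$.

By Theorem 3.1, we have

(4.8)    $\frac{1}{\sqrt{T}}X_1^{*\prime}\epsilon_g \Rightarrow N(0, \sigma_g^2 M_{x_1^*x_1^*})$,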
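and

(4.9)    $\frac{1}{T}X_2^{*\prime}\epsilon_g \Rightarrow \int B_{x_2^*}\,dB_{\epsilon_g}$,

where $B_{\epsilon_g}$ denotes the Brownian motion of $\epsilon_{gt}$ with variance $\sigma_g^2$, and $B_{x_2^*}$ denotes a
$b^* \times 1$ vector Brownian motion of $\nabla x_{2t}^*$ with covariance matrix $\Omega_{\nabla x_2^*\nabla x_2^*}$, where
$\Omega_{\nabla x_2^*\nabla x_2^*}$ is the long-run covariance matrix of $\nabla x_{2t}^*$. The Brownian motions $B_{x_2^*}$ and
$B_{\epsilon_g}$ are not independent because $\epsilon_{gt}$ and $\nabla x_{2t}^*$ are contemporaneously correlated.
Following Phillips [44], we can decompose the right hand side of (4.9) into two terms as

(4.10)    $\int B_{x_2^*}\,dB_{\epsilon_g} = \int B_{x_2^*}\,dB_{\epsilon_g \cdot x_2^*} + \int B_{x_2^*}\,dB_{x_2^*}'\,\Omega_{\nabla x_2^*\nabla x_2^*}^{-1}\Omega_{\nabla x_2^*\epsilon_g}$,

where $B_{\epsilon_g \cdot x_2^*} = B_{\epsilon_g} - \Omega_{\epsilon_g\nabla x_2^*}\Omega_{\nabla x_2^*\nabla x_2^*}^{-1}B_{x_2^*}$ and $\Omega_{\epsilon_g\nabla x_2^*}$ denotes the
long-run covariance between $\epsilon_{gt}$ and $\nabla x_{2t}^*$. The first term of (4.10) is a mixed normal.
The second term involves a matrix unit root distribution that arises from using lagged $w$ as
instruments when $w$ is $I(1)$ and the contemporaneous correlation between $\epsilon_{gt}$ and
$\nabla x_{2t}^*$ is nonzero. The "long-run endogeneity" of the nonstationary instruments $X_2^*$ leads
to a skewness of the limiting distribution of $\hat\delta_{g2,2SLS}^*$ and a dependence on nuisance
parameters that are impossible to eliminate by the 2SLS. Therefore,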

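Theorem 4.1. Under A.1-A.4 the 2SLS estimator of $\delta_g$ is consistent and

(4.11)    $\sqrt{T}(\hat\delta_{g1,2SLS}^* - \delta_{g1}^*) \Rightarrow N(0, \sigma_g^2 (M_{z_{g1}^*x_1^*}M_{x_1^*x_1^*}^{-1}M_{x_1^*z_{g1}^*})^{-1})$,

(4.12)    $T(\hat\delta_{g2,2SLS}^* - \delta_{g2}^*) \Rightarrow \left[\int B_{z_{g2}^*}B_{x_2^*}'\,dr \left(\int B_{x_2^*}B_{x_2^*}'\,dr\right)^{-1} \int B_{x_2^*}B_{z_{g2}^*}'\,dr\right]^{-1} \int B_{z_{g2}^*}B_{x_2^*}'\,dr \left(\int B_{x_2^*}B_{x_2^*}'\,dr\right)^{-1} \left\{\int B_{x_2^*}\,dB_{\epsilon_g \cdot x_2^*} + \int B_{x_2^*}\,dB_{x_2^*}'\,\Omega_{\nabla x_2^*\nabla x_2^*}^{-1}\Omega_{\nabla x_2^*\epsilon_g}\right\}$,

where $B_{z_{g2}^*}$ denotes a $b_g \times 1$ vector Brownian motion of $\nabla z_{g2,t}^*$ which appears in
the $g$-th equation. The distributions of (4.11) and (4.12) are asymptotically independent.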

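Theorem 4.1 suggests that inference about the null hypothesis $P\delta_g = c$ can
be tricky, where $P$ and $c$ are a known matrix and vector of proper dimensions. If
$\sqrt{T}P(\hat\delta_{g,2SLS} - \delta_g)$ has a nonsingular covariance matrix, the limiting distribution
of $P\hat\delta_{g,2SLS}$ is determined by the limiting distribution of $\hat\delta_{g1,2SLS}^*$, hence the Wald test statistic

(4.13)    $(P\hat\delta_{g,2SLS} - c)'\,\mathrm{Cov}(P\hat\delta_{g,2SLS})^{-1}(P\hat\delta_{g,2SLS} - c)$

under the null will be asymptotically chi-square distributed. On the other hand, if
$\sqrt{T}P(\hat\delta_{g,2SLS} - \delta_g)$ has a singular covariance matrix, it means that there exists a
nonsingular matrix $L$ such that

(4.14)    $LP\delta_g = LP^*\delta_g^* = \begin{bmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{bmatrix} \begin{pmatrix} \delta_{g1}^* \\ \delta_{g2}^* \end{pmatrix}$

with nonzero $P_{22}$, where $P^* = PM_g$. Then

(4.15)    $(P\hat\delta_{g,2SLS} - c)'\,\mathrm{Cov}(P\hat\delta_{g,2SLS})^{-1}(P\hat\delta_{g,2SLS} - c)$
$= \left\{\begin{bmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{bmatrix} \begin{pmatrix} \hat\delta_{g1,2SLS}^* \\ \hat\delta_{g2,2SLS}^* \end{pmatrix} - Lc\right\}' \mathrm{Cov}(LP\hat\delta_{g,2SLS})^{-1} \left\{\begin{bmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{bmatrix} \begin{pmatrix} \hat\delta_{g1,2SLS}^* \\ \hat\delta_{g2,2SLS}^* \end{pmatrix} - Lc\right\}$
$\Rightarrow T(P_{11}\hat\delta_{g1,2SLS}^* + P_{12}\hat\delta_{g2,2SLS}^* - c_1)'\,\mathrm{Cov}(\sqrt{T}P_{11}\hat\delta_{g1,2SLS}^*)^{-1}(P_{11}\hat\delta_{g1,2SLS}^* + P_{12}\hat\delta_{g2,2SLS}^* - c_1)$
$\quad + T^2(P_{22}\hat\delta_{g2,2SLS}^* - c_2)'\,\mathrm{Cov}(TP_{22}\hat\delta_{g2,2SLS}^*)^{-1}(P_{22}\hat\delta_{g2,2SLS}^* - c_2)$,

where $Lc = (c_1', c_2')'$. The first term on the right hand side of (4.15) is asymptotically
chi-square distributed. The second term, according to Theorem 3.1, has a
nonstandard distribution. Hence (4.15) is not asymptotically chi-square distributed.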

If there exists prior information that satisfies (2.9), and $w_1$ and $w_2$ are cointegrated
with $x_2$ contained in $w_2$, it was shown by Hsiao [24] that the 2SLS converges
to a mixed normal distribution. Then the Wald test statistic (4.13) can again be
approximated by a chi-square distribution. When variables cannot be dichotomized
into "endogenous" and "exogenous", and we know neither the direction of nonstationarity
nor the rank of cointegration, we cannot know a priori whether $P_{22}$ is a
zero matrix, and hence whether (4.13) may be approximated by a chi-square distribution.
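
As a minimal sketch of the Wald statistic discussed above, the following computes $(P\hat\delta - c)'[P\,\mathrm{Cov}(\hat\delta)P']^{-1}(P\hat\delta - c)$, which coincides with (4.13) under the null $P\delta_g = c$. The function name and the numbers are illustrative assumptions, not taken from the paper; the estimate and its covariance are simply supplied as inputs.

```python
import numpy as np

def wald_statistic(delta_hat, cov_delta_hat, P, c):
    """Wald statistic (P d - c)' [P Cov(d) P']^{-1} (P d - c)."""
    delta_hat = np.asarray(delta_hat, dtype=float)
    P = np.atleast_2d(np.asarray(P, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    r = P @ delta_hat - c                      # deviation from the null
    cov_r = P @ np.asarray(cov_delta_hat, dtype=float) @ P.T
    return float(r @ np.linalg.solve(cov_r, r))

# Hypothetical estimate and covariance, for illustration only.
delta_hat = np.array([1.2, 0.5])
cov = np.diag([0.04, 0.01])
W = wald_statistic(delta_hat, cov, P=[[1.0, 0.0]], c=[1.0])
```

Whether `W` can be compared with a chi-square critical value depends on the condition in the text: the restriction must not load on the $T$-consistent component, which the code itself cannot detect.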



5. Modified and lag order augmented 2SLS estimators 

We note that, just like the least squares estimator for the VAR model, the application
of 2SLS does not provide an asymptotically normal or mixed normal estimator because
of the long-run endogeneities between the lagged I(1) instruments and the (current)
shocks of the system. But if we can condition on the innovations driving the common
trends, we can establish the independence between the Brownian motion of
the errors of the conditional system involving the cointegrating relations and the
innovations driving the common trends. The idea of the modified 2SLS estimator
is to apply the 2SLS method to the equation conditional on the innovations driving
the common trends. Unfortunately, the direction of nonstationarity is generally
unknown. Nor does the identification condition given by Theorem 2.1 require
such knowledge. In the event that such knowledge is unavailable, Hsiao and Wang
[13] propose to generalize Phillips' fully modified VAR estimator to the 2SLS
estimator.

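Rewrite (4.1) as

(5.1) $w_g = Z_gM_gM_g^{-1}\delta_g + \epsilon_g = (Z^{**}_{g1}, Z^{**}_{g2})\begin{pmatrix} \delta^{**}_{g1} \\ \delta^{**}_{g2} \end{pmatrix} + \epsilon_g = Z_g^{**}\delta_g^{**} + \epsilon_g$,

where $Z_g^{**} = Z_gM_g = (Z^{**}_{g1}, Z^{**}_{g2})$, $Z^{**}_{g1} = (\nabla\tilde W_g, \nabla W_{g,-1}, \ldots, \nabla W_{g,-p+1})$, $Z^{**}_{g2} = W_{g,-p}$,
$\delta_g^{**} = M_g^{-1}\delta_g$, with $\nabla W_{g,-j}$ denoting the $T \times g_\Delta$ stacked first differences of the
included variables $\nabla w_{g,t-j}$ and $\nabla\tilde W_g$ denoting the $T \times (g_\Delta - 1)$ first differences of
the included variables $\nabla w_{g,t}$, excluding $\nabla w_{gt}$. The decomposition $(Z^{**}_{g1}, Z^{**}_{g2})$ and
$\delta^{**}_g = (\delta^{**\prime}_{g1}, \delta^{**\prime}_{g2})'$ is identical to $(Z^*_{g1}, Z^*_{g2})$ if there are no cointegrating relations
among $w_{gt}$, $d_g = 0$. Unlike $(Z^*_{g1}, Z^*_{g2})$, $(Z^{**}_{g1}, Z^{**}_{g2})$ are well defined and observable.
When $Z^*_{g1} \neq Z^{**}_{g1}$, there exists a nonsingular transformation matrix $D_g$ such that
$(Z^{**}_{g1}, Z^{**}_{g2})D_g = (Z^*_{g1}, Z^*_{g2})$. Then

(5.2) $\delta_g^* = D_g^{-1}\delta_g^{**}$.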

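Let

(5.3) $C_g = (W_{-p}'\nabla W_{-p} - T\Delta_{\nabla w\nabla w})\,\Omega^{-1}_{\nabla w\nabla w}\,\Omega_{\nabla w\epsilon_g}$,

where $\Omega_{uv}$ and $\Delta_{uv}$ denote the long-run covariance and the one-sided long-run
covariance matrix of two sets of I(0) variables, $(u_t, v_t)$,

(5.4) $\Omega_{uv} = \sum_{j=-\infty}^{\infty}\Gamma_{uv}(j)$,

and

(5.5) $\Delta_{uv} = \sum_{j=0}^{\infty}\Gamma_{uv}(j)$,

where $\Gamma_{uv}(j) = E\,u_tv_{t-j}'$. Let

(5.6) $\hat C_g = (W_{-p}'\nabla W_{-p} - T\hat\Delta_{\nabla w\nabla w})\,\hat\Omega^{-1}_{\nabla w\nabla w}\,\hat\Omega_{\nabla w\epsilon_g}$,

where $\hat\Omega_{uv}$ and $\hat\Delta_{uv}$ are the kernel estimates of $\Omega_{uv}$ and $\Delta_{uv}$, such as (3.7) and (3.8).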
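
The kernel estimates entering (5.6) can be sketched as follows. Since (3.7) and (3.8) are not reproduced in this section, the Bartlett kernel, the function names and the `bandwidth` argument below are assumptions made for illustration; other kernels would change only the `weight` line.

```python
import numpy as np

def autocov(u, v, j):
    """Sample Gamma_uv(j) = (1/T) sum_t u_t v_{t-j}' for any integer j."""
    T = u.shape[0]
    if j >= 0:
        return u[j:].T @ v[:T - j] / T
    return u[:T + j].T @ v[-j:] / T

def long_run_cov(u, v, bandwidth):
    """Bartlett-kernel estimates (Omega_hat, Delta_hat) of (5.4) and (5.5)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    omega = np.zeros((u.shape[1], v.shape[1]))
    delta = np.zeros_like(omega)
    for j in range(-bandwidth, bandwidth + 1):
        weight = 1.0 - abs(j) / (bandwidth + 1.0)   # Bartlett weights (assumed kernel)
        g = autocov(u, v, j)
        omega += weight * g                          # two-sided sum, as in (5.4)
        if j >= 0:
            delta += weight * g                      # one-sided sum, as in (5.5)
    return omega, delta

rng = np.random.default_rng(0)
u = rng.standard_normal((200, 2))                    # placeholder I(0) data
omega_hat, delta_hat = long_run_cov(u, u, bandwidth=4)
```

When `u` and `v` are the same series, the two estimates satisfy `omega_hat = delta_hat + delta_hat.T - autocov(u, u, 0)`, which is a convenient sanity check on any implementation.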
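A modified 2SLS estimator following Phillips' [45] fully modified VAR estimator can
be defined as

(5.7) $\hat\delta^{**}_{g,m2SLS} = \left[Z_g^{**\prime}X^{**}(X^{**\prime}X^{**})^{-1}X^{**\prime}Z_g^{**}\right]^{-1}\left[Z_g^{**\prime}X^{**}(X^{**\prime}X^{**})^{-1}\begin{pmatrix} X_1^{**\prime}w_g \\ X_2^{**\prime}w_g - \hat C_g \end{pmatrix}\right]$,

where $X^{**} = XM_x = (X_1^{**}, X_2^{**})$, $X_1^{**} = (\nabla W_{-1}, \ldots, \nabla W_{-p+1})$, and $X_2^{**} = W_{-p}$.
Just like $(Z^{**}_{g1}, Z^{**}_{g2})$, $(X_1^{**}, X_2^{**})$ are well defined and observable.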
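
A minimal sketch of the estimator in (5.7) is given below. It assumes that the correction $\hat C_g$ is subtracted only from the block of $X'w_g$ that corresponds to the I(1) instruments $X_2^{**} = W_{-p}$, and it takes `C_hat` as an input rather than computing it from (5.6). The function and argument names are hypothetical.

```python
import numpy as np

def modified_2sls(Z, X1, X2, w, C_hat):
    """2SLS with a long-run correction on the I(1)-instrument block of X'w."""
    X = np.hstack([X1, X2])
    proj = np.linalg.solve(X.T @ X, X.T @ Z)        # (X'X)^{-1} X'Z
    A = Z.T @ X @ proj                               # Z'X (X'X)^{-1} X'Z
    Xtw = np.concatenate([X1.T @ w, X2.T @ w - C_hat])
    b = proj.T @ Xtw                                 # Z'X (X'X)^{-1} [corrected X'w]
    return np.linalg.solve(A, b)

# Synthetic, noise-free data (illustration only): with a zero correction the
# formula collapses to ordinary 2SLS and recovers the coefficients exactly.
rng = np.random.default_rng(3)
T = 60
X1 = rng.standard_normal((T, 2))
X2 = rng.standard_normal((T, 2)).cumsum(axis=0)     # I(1)-like instruments
X_all = np.hstack([X1, X2])
Z = X_all @ rng.standard_normal((4, 3)) + 0.1 * rng.standard_normal((T, 3))
delta_true = np.array([1.0, -2.0, 0.5])
w = Z @ delta_true
delta_m = modified_2sls(Z, X1, X2, w, C_hat=np.zeros(2))
```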

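Theorem 5.2. Under assumptions A1-A4, 3.1 and 3.2, the modified 2SLS estimator
$\hat\delta^*_{g,m2SLS} = D_g^{-1}\hat\delta^{**}_{g,m2SLS}$ is consistent. Furthermore,

(5.8) $\sqrt{T}(\hat\delta^*_{g1,m2SLS} - \delta^*_{g1}) \Rightarrow N\big(0,\ \sigma_g^2(M^*_{z_{g1}x_1}M^{*-1}_{x_1x_1}M^*_{x_1z_{g1}})^{-1}\big)$

and is independent of

(5.9) $T(\hat\delta^*_{g2,m2SLS} - \delta^*_{g2}) \Rightarrow (M^*_{z_{g2}x_2}M^{*-1}_{x_2x_2}M^*_{x_2z_{g2}})^{-1}M^*_{z_{g2}x_2}M^{*-1}_{x_2x_2}\int B_{x_2^*}\,dB_{g\cdot\nabla x_2^*}$,

which is a mixed normal of the form

(5.10) $\int N\big(0,\ \sigma^2_{g\cdot\nabla x_2^*}(M^*_{z_{g2}x_2}M^{*-1}_{x_2x_2}M^*_{x_2z_{g2}})^{-1}\big)\,dP(M^*_{x_2z_{g2}})$,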

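where $\sigma^2_{g\cdot\nabla x_2^*} = \sigma_g^2 - \Omega_{\epsilon_g\nabla x_2^*}\Omega^{-1}_{\nabla x_2^*\nabla x_2^*}\Omega_{\nabla x_2^*\epsilon_g}$.
The modified 2SLS estimator of $\delta_g$ can be obtained as

(5.11) $\hat\delta_{g,m2SLS} = M_g\hat\delta^{**}_{g,m2SLS} = M_gD_g\hat\delta^*_{g,m2SLS}$,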

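where $M_g$ is a known matrix but, in general, $D_g$ is not. However, although the modified
2SLS estimator of $\delta_g^*$ is either asymptotically normal or mixed normal, the Wald
type test statistic

(5.12) $\frac{1}{\hat\sigma_g^2}(P\hat\delta_{g,m2SLS} - c)'\{P[Z_g'X(X'X)^{-1}X'Z_g]^{-1}P'\}^{-1}(P\hat\delta_{g,m2SLS} - c)$

does not always have the asymptotic chi-square distribution under the null hypothesis
$P\delta_g = c$, where $P$ is a known $k \times g_\Delta$ matrix of rank $k$. To see this, rewrite
(5.12) in terms of $\hat\delta^*_{g,m2SLS}$:

(5.13) $\frac{1}{\hat\sigma_g^2}(P^*H_g\hat\delta^*_{g,m2SLS} - c)'\{P^*H_g[Z_g^{*\prime}X^*(X^{*\prime}X^*)^{-1}X^{*\prime}Z_g^*]^{-1}H_g'P^{*\prime}\}^{-1}(P^*H_g\hat\delta^*_{g,m2SLS} - c)$,

where $P^* = PM_gD_gH_g^{-1}$ and $H_g = \begin{pmatrix} T^{1/2}I & 0 \\ 0 & TI \end{pmatrix}$, with blocks conformable with $(\delta^{*\prime}_{g1}, \delta^{*\prime}_{g2})'$. The null hypothesis becomes
$P^*H_g\delta_g^* = c$. Notice that the asymptotic covariance matrix of $H_g\hat\delta^*_{g,m2SLS}$
converges to

$\begin{pmatrix} \sigma_g^2(M^*_{z_{g1}x_1}M^{*-1}_{x_1x_1}M^*_{x_1z_{g1}})^{-1} & 0 \\ 0 & \sigma^2_{g\cdot\nabla x_2^*}(M^*_{z_{g2}x_2}M^{*-1}_{x_2x_2}M^*_{x_2z_{g2}})^{-1} \end{pmatrix}$,

while $\hat\sigma_g^2H_g[Z_g^{*\prime}X^*(X^{*\prime}X^*)^{-1}X^{*\prime}Z_g^*]^{-1}H_g'$ in (5.13) converges to

(5.14) $\begin{pmatrix} \sigma_g^2(M^*_{z_{g1}x_1}M^{*-1}_{x_1x_1}M^*_{x_1z_{g1}})^{-1} & 0 \\ 0 & \sigma_g^2(M^*_{z_{g2}x_2}M^{*-1}_{x_2x_2}M^*_{x_2z_{g2}})^{-1} \end{pmatrix}$.

The Wald statistic (5.12) (or equivalently (5.13)) is asymptotically chi-square distributed
with $k$ degrees of freedom if and only if $P\hat\delta_{g,m2SLS}$ (or equivalently $P^*H_g\hat\delta^*_{g,m2SLS}$)
in the hypothesis does not involve the $T$-consistent component $\hat\delta^*_{g2,m2SLS}$;
otherwise, $\hat\sigma_g^2H_g[Z_g^{*\prime}X^*(X^{*\prime}X^*)^{-1}X^{*\prime}Z_g^*]^{-1}H_g'$ would overestimate the asymptotic covariance
matrix of $H_g\hat\delta^*_{g,m2SLS}$, because $\sigma^2_{g\cdot\nabla x_2^*} \le \sigma_g^2$ for the submatrix corresponding
to $\delta^*_{g2}$ and $z^*_{g2}$. In general, the test statistic (5.12) is a conservative test, with its
asymptotic distribution a weighted sum of $k$ independent $\chi^2_1$ variables with weights
between 0 and 1.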
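
The sense in which such a test is conservative can be illustrated with a small simulation. The weights below are hypothetical and chosen only for illustration; because each weight lies in $[0, 1]$, the weighted sum never exceeds the unweighted sum on the same draws, so its rejection rate at the $\chi^2_k$ critical value cannot exceed the nominal level.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_draws = 3, 200_000
z2 = rng.standard_normal((n_draws, k)) ** 2      # independent chi-square(1) draws

weights = np.array([1.0, 0.6, 0.2])              # hypothetical weights in [0, 1]
chi2_k = z2.sum(axis=1)                           # chi-square(k) benchmark
weighted = z2 @ weights                           # weighted sum of chi-square(1)

crit = np.quantile(chi2_k, 0.95)                  # simulated 5% critical value
size_nominal = np.mean(chi2_k > crit)
size_actual = np.mean(weighted > crit)            # at most size_nominal
```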

The construction of the modified 2SLS estimator requires nonparametric estimation
of the long-run covariance matrix and the one-sided long-run covariance
matrix. It is well known that the kernel estimator, and hence the finite sample
performance of the modified 2SLS estimator, can be affected substantially by the
choice of the bandwidth parameter. In addition, since we cannot approximate the
asymptotic covariance matrix of the modified 2SLS estimator properly, Wald test
statistics based on the modified 2SLS estimator using the formula of (5.12) may
not be chi-square distributed, and critical values based on chi-square distributions
can be used for conservative tests only. However, as noted by Toda and
Yamamoto jsH!], if we augment the order of a p-th order autoregressive process by
the maximum order of integration, then the miscentering and skewness of the limiting
distribution of the least squares estimator will be concentrated on the coefficient
matrices associated with the augmented lagged vectors, which are known a priori to
be zero and can therefore be ignored. Standard inference procedures can still be applied
to the coefficients of the first p coefficient matrices. Hsiao and Wang ^] follow this
idea by proposing a lag augmented 2SLS.

The p-th order structural VAR (2.1) can be written as a (p+1)-th order structural
VAR,

(5.15) $A_0w_t + A_1w_{t-1} + \cdots + A_pw_{t-p} + A_{p+1}w_{t-p-1} = \epsilon_t$,

where $A_{p+1} = 0$. Transforming (5.15) into an error-correction form, we have

(5.16) $\sum_{j=0}^{p}A_j^*\nabla w_{t-j} + A^*_{p+1}w_{t-p-1} = \epsilon_t$,

where $A_j^* = \sum_{\ell=0}^{j}A_\ell$, $j = 0, 1, \ldots, p$, and $A^*_{p+1} = A^*_p$. It follows that $A = [A_0,
\ldots, A_p] = [A_0^*, \ldots, A_p^*]M^{-1}$.
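
The equivalence between (5.15) and (5.16) can be checked numerically. The sketch below uses arbitrary random coefficient matrices and an arbitrary series (all assumptions for illustration); it verifies that the level form and the error-correction form give the same left-hand side, and that the original $A_j$ are recovered by differencing the $A_j^*$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, T = 2, 2, 30
A = [rng.standard_normal((m, m)) for _ in range(p + 1)]   # A_0, ..., A_p
A.append(np.zeros((m, m)))                                 # A_{p+1} = 0
A = np.array(A)
A_star = np.cumsum(A, axis=0)                              # A*_j = sum_{l<=j} A_l

w = rng.standard_normal((T, m)).cumsum(axis=0)             # arbitrary I(1)-like series
dw = np.zeros_like(w)
dw[1:] = w[1:] - w[:-1]                                    # first differences

max_gap = 0.0
for t in range(p + 1, T):
    levels = sum(A[j] @ w[t - j] for j in range(p + 2))                 # (5.15)
    ecm = sum(A_star[j] @ dw[t - j] for j in range(p + 1)) \
        + A_star[p + 1] @ w[t - p - 1]                                   # (5.16)
    max_gap = max(max_gap, float(np.max(np.abs(levels - ecm))))

# Recover A_0, ..., A_p from the cumulative sums by differencing.
A_back = np.diff(A_star[: p + 1], axis=0, prepend=np.zeros((1, m, m)))
```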

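Let the g-th equation of (5.15) be written as

(5.17) $w_g = Z_g^A\delta_g^A + \epsilon_g$,

where $Z_g^A = (Z_g, W_{g,-(p+1)})$, $\delta_g^{A\prime} = (\delta_g', -a_{g,p+1}')$, with $W_{g,-(p+1)}$ denoting the
$T \times g_\Delta$ matrix of included $w_g$ lagged by $(p+1)$ periods, and $a_{g,p+1}'$ is the g-th
row of $A_{p+1}$ excluding those elements subject to exclusion restrictions. Just like
(4.1), there exists a nonsingular transformation matrix $M_g^A$ that transforms $Z_g^A$
into $Z_g^{A*} = Z_g^AM_g^A = (Z^{A*}_{g1}, Z^{A*}_{g2})$, and $\delta_g^{A*} = (M_g^A)^{-1}\delta_g^A = (\delta^{A*\prime}_{g1}, \delta^{A*\prime}_{g2})'$, where
$Z^{A*}_{g1} = (\nabla Z_g, W_{g,-(p+1)}\alpha_g)$ is stationary and $Z^{A*}_{g2} = W_{g2,-(p+1)}$ consists of $T$ observed
$b_g$ linearly independent I(1) variables, $w_{g2,t-(p+1)}$. Rewrite (5.17) in terms
of the transformed variables,

(5.18) $w_g = Z_g^AM_g^A(M_g^A)^{-1}\delta_g^A + \epsilon_g = (Z^{A*}_{g1}, Z^{A*}_{g2})\begin{pmatrix} \delta^{A*}_{g1} \\ \delta^{A*}_{g2} \end{pmatrix} + \epsilon_g$.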

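Let $X^A = (X, W_{-(p+1)})$. The 2SLS estimator of (5.17) is defined as

(5.19) $\hat\delta^A_{g,2SLS} = [Z_g^{A\prime}X^A(X^{A\prime}X^A)^{-1}X^{A\prime}Z_g^A]^{-1}[Z_g^{A\prime}X^A(X^{A\prime}X^A)^{-1}X^{A\prime}w_g]$.

The LA2SLS of (4.1) is defined as

(5.20) $\hat\delta_{g,LA2SLS} = Q_g^A\hat\delta^A_{g,2SLS}$,

where $Q_g^A = (I_{(p+1)g_\Delta - 1}, 0_{g_\Delta})$, where $0_{g_\Delta}$ denotes a $[(p+1)g_\Delta - 1] \times g_\Delta$ matrix of
zeros. Since $\hat\delta^A_{g,2SLS} = M_g^A\hat\delta^{A*}_{g,2SLS}$, we have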

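(5.21) $\hat\delta_{g,LA2SLS} = Q_g^AM_g^A\hat\delta^{A*}_{g,2SLS} = (\tilde M_g, \Theta_g, 0)\hat\delta^{A*}_{g,2SLS} = (\tilde M_g, \Theta_g)\hat\delta^{A*}_{g1,2SLS}$,

where $\tilde M_g$ is a $[(p+1)g_\Delta - 1] \times [(p+1)g_\Delta - 1]$ matrix of the form*

(5.22) [display matrix for $\tilde M_g$, built from the blocks $I_{g_\Delta - 1}$, $I_{g_\Delta}$, $-I_{g_\Delta}$ and $0'$; illegible in the source]

with the dependent variable $w_{gt}$ being put as the last element of the vector of included
variables, $I_{g_\Delta}$ denoting the identity matrix of the
dimension of included variables in the g-th equation, and $\Theta_g$ is a $[(p+1)g_\Delta - 1] \times r_g$
matrix with $r_g$ denoting the number of cointegrating relations among $w_{g,t}$ such that
$w_{g,t}'\alpha_g$ is I(0). Then $\delta_g = (\tilde M_g, \Theta_g)\delta^{A*}_{g1}$.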
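Since

(5.23) $\sqrt{T}(\hat\delta^{A*}_{g1,2SLS} - \delta^{A*}_{g1}) \Rightarrow N\big(0,\ \sigma_g^2(M^{A*}_{z_{g1}x_1}M^{A*-1}_{x_1x_1}M^{A*}_{x_1z_{g1}})^{-1}\big)$,

where $M^{A*}_{z_{g1}x_1} = \mathrm{plim}\,\frac{1}{T}Z^{A*\prime}_{g1}X^{A*}_1$, $M^{A*}_{x_1x_1} = \mathrm{plim}\,\frac{1}{T}X^{A*\prime}_1X^{A*}_1$, with $X^{A*}_1 = (\nabla X,
W_{-(p+1)}\alpha)$ being the $T \times (mp + r)$ linearly independent I(0) variables, it follows
that

Theorem 5.3. The LA2SLS of $\delta_g$ is consistent and

(5.24) $\sqrt{T}(\hat\delta_{g,LA2SLS} - \delta_g) \Rightarrow N\big(0,\ \sigma_g^2(\tilde M_g, \Theta_g)(M^{A*}_{z_{g1}x_1}M^{A*-1}_{x_1x_1}M^{A*}_{x_1z_{g1}})^{-1}(\tilde M_g, \Theta_g)'\big)$.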

The LA2SLS estimators of the coefficients of the original structural VAR model
(2.1) converge to the true value at the speed of $T^{1/2}$ and are asymptotically normally
distributed with nonsingular covariance matrix. Therefore, Wald type test statistics
based on LA2SLS estimates are asymptotically chi-square distributed. Compared
to the conventional 2SLS or modified 2SLS, the LA2SLS estimator loses the
$T$-convergence component and ignores the prior restrictions that the coefficients on
$w_{g,t-(p+1)}$ are zero, hence may lose some efficiency. However, since the distribution of
$\hat\delta_g$ is a linear combination of $\hat\delta^*_{g1}$ and $\hat\delta^*_{g2}$, and the limiting distribution of $\hat\delta_{g,LA2SLS}$
is given by the components with the slower rate of convergence, the loss of efficiency
in estimating $\delta_g$ by LA2SLS may not be that significant, as reported in a Monte
Carlo study by Hsiao and Wang.
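
The mechanics of (5.17) to (5.20) can be sketched as follows: augment the regressors and the instruments with the extra lag, run ordinary 2SLS on the augmented equation, and keep only the first block of coefficients (the role of $Q_g^A$). The function names, argument names and the synthetic data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def two_sls(Z, X, w):
    """2SLS: [Z'P_X Z]^{-1} Z'P_X w, with P_X the projection onto X."""
    Z_fit = X @ np.linalg.lstsq(X, Z, rcond=None)[0]   # first-stage fitted values
    return np.linalg.solve(Z_fit.T @ Z, Z_fit.T @ w)

def la2sls(Z, W_g_extra, X, W_extra, w):
    """Lag-augmented 2SLS: estimate the augmented equation, drop the extra-lag block."""
    Z_aug = np.hstack([Z, W_g_extra])                  # Z_g^A = (Z_g, W_{g,-(p+1)})
    X_aug = np.hstack([X, W_extra])                    # X^A = (X, W_{-(p+1)})
    delta_aug = two_sls(Z_aug, X_aug, w)
    return delta_aug[: Z.shape[1]]                     # selection by Q_g^A

# Synthetic, noise-free check: the true coefficient on the extra lag is zero,
# so the retained block should equal the true coefficients.
rng = np.random.default_rng(2)
T = 80
X = rng.standard_normal((T, 4))
W_extra = rng.standard_normal((T, 2))
W_g_extra = W_extra[:, :1]                             # included variable at lag p+1
Z = X @ rng.standard_normal((4, 3)) + 0.1 * rng.standard_normal((T, 3))
delta_true = np.array([0.5, -1.0, 2.0])
w = Z @ delta_true
delta_la = la2sls(Z, W_g_extra, X, W_extra, w)
```

With stationary placeholder data the example only checks the algebra; the inferential gain described in Theorem 5.3 concerns I(1) data and is not demonstrated here.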



6. Conclusions 

As demonstrated by Nelson and Plosser [41], many economic time series are
nonstationary. The advancement of nonstationary time series analysis provides a
rich repertoire of analytic tools for economists to analyze how variables respond
dynamically to shocks through the decomposition of a dynamic system into long-run
and short-run relations, and allows economists to extract common stochastic trends
present in the system that provide information on the important sources of economic
fluctuation (e.g. Banerjee, Dolado, Galbraith and Hendry; King, Plosser, Stock
and Watson [33]). However, nonstationarity does not invalidate the main concerns of
the Cowles Commission structural approach: identification and simultaneity bias. As
shown by Hsiao [26], whether the data are stationary or nonstationary, the same
rank condition holds for the identification of an equation in a system. Ignoring
the correlations between the regressors and the errors of the equation that arise
from the joint dependency of economic variables can lead to severe bias in the least
squares estimator even though the regressors are I(1) (Hsiao [21]; also see the Monte
Carlo study by Hsiao and Wang [28]). Instrumental variable methods have to be
applied to obtain consistency.

*For ease of notation, we assume all the included variables appear with the same lag order.

However, nonstationarity does raise the issue of statistical inference. Standard
instrumental variable methods can lead to estimators that have non-normal asymptotic
distributions and are asymptotically biased and skewed. If there exists prior
knowledge to dichotomize the set of variables into joint dependent and exogenous
variables, and the nonstationarity in the dependent variables is driven by the
nonstationarity in the exogenous variables through cointegration relations, standard 2SLS
developed for stationary data can also be used for the analysis of nonstationary
data (Hsiao [21], [24]). Wald test statistics for the null are asymptotically chi-square
distributed, and there is no inference issue. On the other hand, if all the variables are
treated as joint dependent, as in the time series context, then although 2SLS is consistent,
the limiting distribution is subject to miscentering and skewness associated with
the unit root distribution. Modified or lag order augmented 2SLS will have to be
used to ensure valid inference. The modified 2SLS is asymptotically more efficient.
However, it also suffers more size distortion in finite samples. On the other hand,
the lag order augmented 2SLS does not suffer much efficiency loss, at least in a
small scale SVAR model (e.g. Hsiao and Wang [28]), and the chi-square distribution is
a good approximation for the test statistic.

All of the above discussion was based on the assumption that neither the rank of
cointegration nor the direction of nonstationarity is known a priori. If such information is
available (e.g. King, Plosser, Stock and Watson [33]), estimators incorporating the
knowledge of the rank of cointegration presumably will not only lead to efficient
estimators of structural form parameters, but also avoid the inference issues arising
from the matrix unit root distributions in the system. Unfortunately, structural
form estimation methods incorporating reduced rank restrictions appear to be fairly
complicated.

The focus of this review is to take a SVAR model as a maintained hypothesis,
search for better estimators and understand their properties. We have not looked
at the issues of modeling strategy. There is a vast literature on the interactions
between structural and non-structural time series analysis to uncover the
data-generation process, including testing, estimation, model-combining and prediction
(e.g. Hendry and Ericsson [13], Hendry and Krolzig [lj|, King, Plosser, Stock and
Watson [33], Zellner and Palm [eH).